Introduction / Motivation
This vignette will show why VitalDBR is useful, if you want to do statistical analysis on data from VitalDB. The vignette is split into different parts. First of all we want to prepare the data for analysis. The data we are interested in is the Arterial Blood Pressure (ART) and the Airway Pressure (AWP). We subset filter and detect events for this data through signal processing. After that we show how to use the processed data and detected events to fit a Generalized Additive Model (GAM) and extract the pulse pressure variation. In the end we extend the use case to a novel investigation into the importance of the Pulse Pressure Variation (PPV) on whether patient’s time spent in the ICU (intensive care unit). To do this we fit a GAM for every other minute during operation and see the proportion of those outside the interval $ 5 <= PPV <= 8$. When this dataset is built, we use the results to do a poisson regression regression with the icu-days as the dependent variable. Afterwards we compare the simple poisson regression to a zero-inflated poisson regression and a pseudo compound poisson regression. We of course include other important features like age and BMI, and finally we discuss what confounding variables would have been relevant to include.
Fitting a gam
Preparing everything
Before we get started, we need a couple of packages.
<<<<<<< HEAD# This is how you install VitalDBR!
library(devtools)
install_github('legendenomgeorg/VitalDBR/VitalDBR')
library(VitalDBR)
library(comprehenr)library(waveformtools) # This is a package made by our advisor PhD. student Johannes
library("tidyverse")
library(mgcv) # The standard R package to use GAM's
library(patchwork) #package used for plotting
library()# This is how you install VitalDBR!
library(devtools)
install_github('legendenomgeorg/VitalDBR/VitalDBR')
library(VitalDBR)library(waveformtools) # This is a package made by our advisor PhD. student Johannes
library("tidyverse")
library(mgcv) # The standard R package to use GAM's
library(patchwork) #package used for plottingThen we need to load some data. For that we have created the function “load_case”, which takes in a monitoring machine, the track of interest and finally the caseid.
data_art <- VitalDBR::load_case('SNUADC/ART', 1)
data_awp <- VitalDBR::load_case('Primus/AWP', 1)In this case we load the tracks ART (Arterial Blood Pressure) and AWP (Airway Pressure) for patient 1. ART is measured by the “SNUADC” machine and AWP is measured by the “Primus” machine. Under the hood this function finds all tracks with the inputted caseid and then stiches together the right API call such that we get the the right data. The format of the data also makes it easy to convert to a timeseries format as the frequency is saved in the first row of the first column, like this:
tsp(ts(data_art[,2], frequency = 1/data_art[1,1])) # Output is first value, last value and frequency## [1] 1.00 11541.31 500.00
But we will continue working in the data.frame format for this vignette.
To make sure that we work with the same data through out the entire vignette, we start out by defining an interval we want to work on
start = 10000 # Interval starts at 10000 seconds
sec = 30 # Last 30 seconds from 10000. Such that the interval is now 10000 sec to 10030 secFinding the inspiration of AWP
Now we introduce our function: “subset_data”. This function helps with choosing an interval of a timeseries defined by frequency. It automatically sets converts frequency to seconds and subsets the data. The reason the time variable is not converted to the row index, is that it makes it difficult to work with.
<<<<<<< HEAD =======sub_awp <- VitalDBR::subset_data(data = data_awp, seconds = sec, start_sec = start)
knitr::kable(head(sub_awp))| Time | Primus.AWP |
|---|---|
| 10000.00 | 7.3664 |
| 10000.02 | 7.3664 |
| 10000.03 | 7.3664 |
| 10000.05 | 7.3664 |
| 10000.06 | 7.3664 |
| 10000.08 | 7.3664 |
We now have the right subset of our AWP data, and we can simply plot it like so:
<<<<<<< HEADNow, as stated in the introduction, we are interested in finding the inspiration of the airway pressure, which is the beginning of a breath. For that we have created the function “get_inspiration_start”. By default, it works by applying a convolution filter of the form (-1,-1,-1,-1,-1,-1,-1,-1,0,1,1,1,1,1,1,1,1) (note: stats::filter takes the convolution filter in reverse, so we input the reverse of this). This creates a waveform which peaks at sudden large changes in amplitude. This means that we can use a standard peak-finding algorithm on the convolution, to find the inspirations.
=======ggplot(sub_awp, aes(Time, Primus.AWP)) +
geom_line() Now, as stated in the introduction, we are interested in finding the inspiration of the airway pressure, which is the beginning of a breath. For that we have created the function “get_inspiration_start”. It works by applying a convolution filter of the form (-1,-1,-1,-1,-1,-1,-1,-1,0,1,1,1,1,1,1,1,1) (note: stats::filter takes the convolution filter in reverse, so we input the reverse of this). This creates a waveform which peaks at sudden large changes in amplitude. This means that we can use a standard peak-finding algorithm on the convolution, to find the inspirations.
insp_start <- VitalDBR::get_inspiration_start(sub_awp)
knitr::kable(head(insp_start))| time |
|---|
| 10001.65 |
| 10005.94 |
| 10010.22 |
| 10014.51 |
| 10018.78 |
| 10023.09 |
The function returns the times of the peaks in a dataframe And we can then plot the inspirations on the AWP signal:
<<<<<<< HEADggplot(sub_awp, aes(Time, Primus.AWP)) +
geom_line() +
geom_vline(aes(xintercept = time), color = "red",
data = insp_start)Finding the diastolic and systolic peaks of the Arterial Blood Pressure (ART)
To find the diastolic and systolic peaks of the Arterial Blood Pressure (ART) we first subset the data, to the same interval as before. But the noise in the ART data can disturb our ability to find the peaks properly. Luckily you can give “subset_data” an optional argument called “filter”. If the filter argument is set to TRUE, it applies a Butterworth filter to the data. The result can be seen if you zoom into the plot below (interactively!). We can also set the cutoff frequency for the filter with the argument “cut_freq” (default is 25).
<<<<<<< HEAD| Time | SNUADC.ART | SNUADC.ART_filt |
|---|---|---|
| 10000.00 | 103.760 | 58.23638 |
| 10000.00 | 104.748 | 69.50366 |
| 10000.00 | 104.748 | 79.71424 |
| 10000.01 | 105.735 | 88.46599 |
| 10000.01 | 105.735 | 95.61850 |
| 10000.01 | 105.735 | 101.21007 |
Now that the data is prepared we use a function created by our advisor Johannes, thats finds the value and position of the peaks. It also finds the pulse pressure, which is just the difference between the systolic and diastolic blood pressure: $ PP = SBP - DBP $
## Sample rate not set. Returning beat length in samples
=======
sub_art <- VitalDBR::subset_data(data = data_art, seconds = sec, start_sec = start, filter=TRUE, cut_freq = 25)
dygraph_signal(sub_art, 2, 3)Now that the data is prepared we use a function created by our advisor Johannes, thats finds the value and position of the peaks. It also finds the pulse pressure, which is just the difference between the systolic and diastolic blood pressure: $ PP = SBP - DBP $
beats <- waveformtools::find_abp_beats(sub_art,abp_col=3,time_col=1)[-1,] # we skip the first observation## Sample rate not set. Returning beat length in samples
knitr::kable(head(beats))| time | dia | sys | PP | beat_len | time_systole | dia_pos | sys_pos | .noise_wiggliness | .noise_pos_after_sys |
|---|---|---|---|---|---|---|---|---|---|
| 10001.33 | 51.31872 | 110.6373 | 59.31856 | 365 | 10001.51 | 667 | 754 | 0.1605111 | NA |
| 10002.06 | 50.11974 | 108.6104 | 58.49065 | 363 | 10002.23 | 1032 | 1115 | 0.1429079 | NA |
| 10002.79 | 49.85253 | 108.3100 | 58.45746 | 362 | 10002.97 | 1395 | 1484 | 0.1512188 | NA |
| 10003.51 | 49.73168 | 109.7397 | 60.00801 | 362 | 10003.69 | 1757 | 1845 | 0.1516818 | NA |
| 10004.24 | 51.22496 | 111.3532 | 60.12822 | 366 | 10004.41 | 2119 | 2208 | 0.1582818 | NA |
| 10004.97 | 52.26622 | 112.2327 | 59.96645 | 365 | 10005.14 | 2485 | 2573 | 0.1810274 | NA |
Now we plot the Arterial blood pressure, together with the peaks and the inspiration:
<<<<<<< HEAD Beautiful, right?
We can also plot the pulse pressure with the inspiration. NOTE: As the pulse pressure is a discrete measurement, this is plotted with a linear line interpolation.
But we want to know which inspiration every pulse pressure measurement belongs to, and we want to know the position of the pulse pressure measurement relative to the inspiration. To do this we use a function created by our advisor Johannes, that adds “ann_n” which says which inspiration a measurement belongs to. And “ann_rel_index” which gives the position relative to the inspiration (ie. it goes from 0 to 1 between each inspiration)
=======ggplot(sub_art, aes(Time, SNUADC.ART)) +
geom_line() +
geom_vline(aes(xintercept = time), color = "red",
data = insp_start)+
geom_point(aes(x = time,
y = dia,
colour = "diastole"),
data = beats) +
geom_point(aes(x = time_systole,
y = sys,
colour = "systole"),
data = beats) Beautiful, right?
We can also plot the pulse pressure with the inspiration. NOTE: As the pulse pressure is a discrete measurement, this is plotted with a linear line interpolation.
pp_plot <- ggplot(beats, aes(time, PP)) +
geom_line() +
geom_point() +
geom_vline(aes(xintercept = time), color = "red",
data = insp_start)
pp_plotBut we want to know which inspiration every pulse pressure measurement belongs to, and we want to know the position of the pulse pressure measurement relative to the inspiration. To do this we use a function created by our advisor Johannes, that adds “ann_n” which says which inspiration a measurement belongs to. And “ann_rel_index” which gives the position relative to the inspiration (ie. it goes from 0 to 1 between each inspiration)
beats_indexed <- waveformtools::add_time_since_event(beats, time_event = insp_start$time)
knitr::kable(head(beats_indexed))| time | dia | sys | PP | beat_len | time_systole | dia_pos | sys_pos | .noise_wiggliness | .noise_pos_after_sys | ann_index | ann_n | ann_cycle_len | ann_rel_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10001.33 | 51.31872 | 110.6373 | 59.31856 | 365 | 10001.51 | 667 | 754 | 0.1605111 | NA | NA | NA | NA | NA |
| 10002.06 | 50.11974 | 108.6104 | 58.49065 | 363 | 10002.23 | 1032 | 1115 | 0.1429079 | NA | 0.414 | 1 | 4.288 | 0.0965485 |
| 10002.79 | 49.85253 | 108.3100 | 58.45746 | 362 | 10002.97 | 1395 | 1484 | 0.1512188 | NA | 1.140 | 1 | 4.288 | 0.2658582 |
| 10003.51 | 49.73168 | 109.7397 | 60.00801 | 362 | 10003.69 | 1757 | 1845 | 0.1516818 | NA | 1.864 | 1 | 4.288 | 0.4347015 |
| 10004.24 | 51.22496 | 111.3532 | 60.12822 | 366 | 10004.41 | 2119 | 2208 | 0.1582818 | NA | 2.588 | 1 | 4.288 | 0.6035448 |
| 10004.97 | 52.26622 | 112.2327 | 59.96645 | 365 | 10005.14 | 2485 | 2573 | 0.1810274 | NA | 3.320 | 1 | 4.288 | 0.7742537 |
This plot then colors the pulse pressure measurements according to which inspiration they belong to. On the right plot we can also start to see why this is important. It seems that there is a relationship between inspiration and pulse pressure. This is the relationship we want to model with the GAM framework.
<<<<<<< HEAD## Warning: Removed 5 row(s) containing missing values (geom_path).
## Warning: Removed 5 rows containing missing values (geom_point).
pp_plot_color <- ggplot(beats_indexed, aes(time, PP)) +
geom_line() +
# insp_n is a unique (consecutive) number for each respiratory cycle
geom_point(aes(color = as.factor(ann_n)), show.legend = FALSE) +
geom_vline(aes(xintercept = time), color = "red",
data = insp_start) +
labs(title = "Pulse pressure",
subtitle = "by time. Color indicate respiratory cycle")
pp_insp_plot <- ggplot(beats_indexed,
aes(
ann_rel_index,
PP,
group = as.factor(ann_n),
color = as.factor(ann_n)
)
) +
geom_line(alpha = 0.3, show.legend = FALSE) +
# insp_n is a unique (consecutive) number for each respiratory cycle
geom_point(aes(color = as.factor(ann_n)), show.legend = FALSE) +
labs(subtitle = "by position in the respiratory cycle")
pp_plot_color + pp_insp_plotFitting the GAM
We can generally write the GAM structure as: \[g(E(Y))=α+s1(x1)+⋯+sp(xp)\] In our case this would translate to: \[g(E(PP))=α+s1(ann\_rel\_index)+⋯+sp(Time)\]
<<<<<<< HEADTHIS NEEDS TO BE FIXED :D
PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]
PP_gam <- gam(
# The first parameter to the gam() function is the model specification,
# supplied using formula notation:
PP ~ # Left of the tilde (~) is our dependent variable PP
# Right of the tilde is our independent variables.
# Define a smooth function of insp_rel_index.
s(ann_rel_index,
k =10, # 10 knots.
bs = "cc" # The basis is a cyclic cubic spline
) +
# Define a smooth function of time
s(time,
bs = "cr" # The basis is a natural cubic spline.
# default k is 10. This will be fine here.
),
# We can specify the positions of the knots for each smooth.
# If only two knots are specified for a cyclic spline, these will
# set the positions of the limiting knot(s). The remaining knots will
# be positioned automatically (at quantiles).
knots = list(ann_rel_index = c(0,1)),
# We use restricted maximum likelihood (REML) to fit the optimal smoothing parameter.
# This is often the best choice, but not the default.
method = "REML",
data = PP_data
)
gratia::draw(PP_gam,
residuals = TRUE) We need the last part of the right hand side, which is the intercept ( or is it the mean? or is that the same?):
PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]
PP_gam <- gam(
# The first parameter to the gam() function is the model specification,
# supplied using formula notation:
PP ~ # Left of the tilde (~) is our dependent variable PP
# Right of the tilde is our independent variables.
# Define a smooth function of insp_rel_index.
s(ann_rel_index,
k =10, # 10 knots.
bs = "cc" # The basis is a cyclic cubic spline
) +
# Define a smooth function of time
s(time,
bs = "cr" # The basis is a natural cubic spline.
# default k is 10. This will be fine here.
),
# We can specify the positions of the knots for each smooth.
# If only two knots are specified for a cyclic spline, these will
# set the positions of the limiting knot(s). The remaining knots will
# be positioned automatically (at quantiles).
knots = list(ann_rel_index = c(0,1)),
# We use restricted maximum likelihood (REML) to fit the optimal smoothing parameter.
# This is often the best choice, but not the default.
method = "REML",
data = PP_data
)
gratia::draw(PP_gam,
residuals = TRUE) We need the last part of the right hand side, which is the intercept ( or is it the mean? or is that the same?):
alpha <- coef(PP_gam)[1] # intercept
alpha## (Intercept)
## 59.64833
Now we can construct the expected value of Y, the pulse pressure, through adding: Alpha, s(ann_rel_index) and s(time)
Time to calculate PPV
<<<<<<< HEADcalc_PPV <- function(smooth, intercept) {
min_PP <- min(smooth)
max_PP <- max(smooth)
(max_PP - min_PP) / unname(intercept)
}
splines <- predict(PP_gam, type = "terms")
PPV <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])
PPVcalc_PPV <- function(smooth, intercept) {
min_PP <- min(smooth)
max_PP <- max(smooth)
(max_PP - min_PP) / unname(intercept)
}
splines <- predict(PP_gam, type = "terms")
PPV <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])
PPV## [1] 0.05500996
Association study - does PPV influence survival of patients?
This part of the vignette revolves around utilizing what we have previously done, to make a large scale association study to understand whether the Pulse Pressure Variation (PPV) during an operation influences the outcome.
Creating the dataset
To create the data set we need to calculate the PPV for every other minute of every operation… That is a lot of calculation! Therefore we have added the final dataset to the repo, such that you do not have to spend sleepless nights hoping that R doesn’t crash during it’s 1147th iteration. But before we can calculate the PPV we need to restrict our dataset to the following criteria: - The surgery approach should be “open” - The department should be “General surgery” - The type of anasthesia should be “General” (These restrictions were decided by our advisers) We do that as such:
<<<<<<< HEADcases <- VitalDBR::load_VDB("https://api.vitaldb.net/cases") %>%
dplyr::filter(approach=="Open",
department=="General surgery",
ane_type=="General") %>%
dplyr::select(caseid, death_inhosp, icu_days, age, sex, asa, emop, bmi, opstart, opend)
knitr::kable(head(cases))cases <- VitalDBR::load_VDB("https://api.vitaldb.net/cases") %>%
dplyr::filter(approach=="Open",
department=="General surgery",
ane_type=="General") %>%
dplyr::select(caseid, death_inhosp, icu_days, age, sex, asa, emop, bmi, opstart, opend)
knitr::kable(head(cases))| caseid | death_inhosp | icu_days | age | sex | asa | emop | bmi | opstart | opend |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 77 | M | 2 | 0 | 26.3 | 1668 | 10368 |
| 2 | 0 | 0 | 54 | M | 2 | 0 | 19.6 | 1721 | 14621 |
| 5 | 0 | 13 | 66 | M | 3 | 1 | 20.4 | 2591 | 20291 |
| 8 | 0 | 0 | 81 | F | 2 | 0 | 27.4 | 752 | 4952 |
| 12 | 0 | 4 | 46 | F | 2 | 0 | 28.4 | 5360 | 30860 |
| 16 | 0 | 0 | 57 | M | 2 | 0 | 26.2 | 2575 | 12175 |
As we know from the first part of this vignette, we need both the Arterial Blood Pressure (ART) and the Airway Pressure (AWP) to calculate the GAM. Therefore we also restrict our data to cases which have both sensors like so:
<<<<<<< HEADtracks <- VitalDBR::load_VDB("https://api.vitaldb.net/trks") %>%
dplyr::filter(tname == "Primus/AWP" | tname == "SNUADC/ART") %>%
count(caseid) %>%
dplyr::filter(n == 2)
knitr::kable(head(tracks))tracks <- VitalDBR::load_VDB("https://api.vitaldb.net/trks") %>%
dplyr::filter(tname == "Primus/AWP" | tname == "SNUADC/ART") %>%
count(caseid) %>%
dplyr::filter(n == 2)
knitr::kable(head(tracks))| caseid | n |
|---|---|
| 1 | 2 |
| 3 | 2 |
| 4 | 2 |
| 7 | 2 |
| 10 | 2 |
| 12 | 2 |
We then join those two datasets together by their case id, which results in the final dataset:
<<<<<<< HEADmerged <- merge(x=tracks,y=cases,by="caseid") %>% dplyr::select(-one_of("n"))
knitr::kable(head(merged))merged <- merge(x=tracks,y=cases,by="caseid") %>% dplyr::select(-one_of("n"))
knitr::kable(head(merged))| caseid | death_inhosp | icu_days | age | sex | asa | emop | bmi | opstart | opend |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 77 | M | 2 | 0 | 26.3 | 1668 | 10368 |
| 12 | 0 | 4 | 46 | F | 2 | 0 | 28.4 | 5360 | 30860 |
| 16 | 0 | 0 | 57 | M | 2 | 0 | 26.2 | 2575 | 12175 |
| 17 | 0 | 1 | 85 | M | 2 | 0 | 19.7 | 3540 | 19140 |
| 19 | 0 | 2 | 74 | M | 2 | 0 | 22.6 | 2184 | 26484 |
| 25 | 0 | 1 | 66 | M | 2 | 0 | 27.8 | 1882 | 13882 |
And now for the part where we combine everything we have done so far to create the dataset we need to be able to do statistical analysis on the effects of the PPV on patients.
INDSÆT OVERORDNET FOKLARING AF HVAD FUNKTIONEN GØR
<<<<<<< HEADcalc_PPV <- function(smooth, intercept) {
min_PP <- min(smooth)
max_PP <- max(smooth)
(max_PP - min_PP) / unname(intercept)
}
ppv_prepare <- function(case, start, end, data_art, data_awp){
# det tager usandsynligt lang tid, så vi skal have fixet den load funktion....
# men ellers er det bare at implemetere PPV udregningen her
op <- end - start
interval <- 120
iterations <- floor(op/interval)
data <- c(matrix(NA, nrow=iterations))
skip_to_next <- FALSE
for (i in 0:iterations){
try_catch <- tryCatch(
{
sub_awp <- VitalDBR::subset_data(data = data_awp, seconds = interval, start_sec = start+(interval*i))
insp_start <- VitalDBR::get_inspiration_start(sub_awp)
sub_art <- VitalDBR::subset_data(data = data_art, seconds = interval, start_sec = start+(interval*i), filter=TRUE, cut_freq = 25)
beats <- find_abp_beats(sub_art, abp_col=3, time_col=1)
beats_indexed <- waveformtools::add_time_since_event(beats, time_event = insp_start$time)
rm(beats)
PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]
PP_gam <- gam(
PP ~
s(ann_rel_index,k = 10, bs = "cc" ) + s(time, k = 10, bs = "cr"), # splines
knots = list(ann_rel_index = c(0,1)), method = "REML", data = PP_data )
splines <- predict(PP_gam, type = "terms")
data[i] <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])*100
cat("Iteration",i, "out of ", iterations,". For case:",case,"\n")
rm(PP_data)
},
error = function(e){
skip_to_next <<- TRUE
}
)
if(skip_to_next) {
data[i] <- NA
next
}
}
return(data)
}
process_cases <- function(data){
ppv_under5 <- data.frame(matrix(NA, nrow = nrow(data)))
ppv_over8 <- data.frame(matrix(NA, nrow = nrow(data)))
counter <- 0
break_and_save <- FALSE
for (caseid in data$caseid){
counter <- counter + 1
closeAllConnections()
try_catch <- tryCatch(
{
cat("Importing ART for case:",caseid,"\n")
art <- VitalDBR::load_case('SNUADC/ART', caseid)
cat("Importing AWP for case:",caseid,"\n")
awp <- VitalDBR::load_case('Primus/AWP', caseid)
},
error = function(e){
break_and_save <<- TRUE
})
if(break_and_save) {
cat("Something went wrong when loading case:", caseid,"Saving results so far", "\n" )
break
}
start <- data$opstart[data$caseid==caseid]
end <- data$opend[data$caseid==caseid]
ppv_results <- na.omit(ppv_prepare(caseid, start, end, art,awp))
rm(art)
rm(awp)
len_ppv <- length(ppv_results)
ppv_under5[counter, 1] <- sum(ppv_results<=5)/len_ppv
ppv_over8[counter, 1] <- sum(ppv_results>=8)/len_ppv
}
colnames(ppv_under5) <- c('ppv_under5')
colnames(ppv_over8) <- c('ppv_over8')
data <- cbind(data, ppv_under5)
data <- cbind(data, ppv_over8)
return(data)
}
#ppv_data <- process_cases(merged)calc_PPV <- function(smooth, intercept) {
min_PP <- min(smooth)
max_PP <- max(smooth)
(max_PP - min_PP) / unname(intercept)
}
ppv_prepare <- function(case, start, end, data_art, data_awp){
# det tager usandsynligt lang tid, så vi skal have fixet den load funktion....
# men ellers er det bare at implemetere PPV udregningen her
op <- end - start
interval <- 120
iterations <- floor(op/interval)
data <- c(matrix(NA, nrow=iterations))
skip_to_next <- FALSE
for (i in 0:iterations){
try_catch <- tryCatch(
{
sub_awp <- VitalDBR::subset_data(data = data_awp, seconds = interval, start_sec = start+(interval*i))
insp_start <- VitalDBR::get_inspiration_start(sub_awp)
sub_art <- VitalDBR::subset_data(data = data_art, seconds = interval, start_sec = start+(interval*i), filter=TRUE, cut_freq = 25)
beats <- find_abp_beats(sub_art, abp_col=3, time_col=1)
beats_indexed <- waveformtools::add_time_since_event(beats, time_event = insp_start$time)
rm(beats)
PP_data <- beats_indexed[,c("PP","time","ann_rel_index")]
PP_gam <- gam(
PP ~
s(ann_rel_index,k = 10, bs = "cc" ) + s(time, k = 10, bs = "cr"), # splines
knots = list(ann_rel_index = c(0,1)), method = "REML", data = PP_data )
splines <- predict(PP_gam, type = "terms")
data[i] <- calc_PPV(splines[,1], intercept = coef(PP_gam)[1])*100
cat("Iteration",i, "out of ", iterations,". For case:",case,"\n")
rm(PP_data)
},
error = function(e){
skip_to_next <<- TRUE
}
)
if(skip_to_next) {
data[i] <- NA
next
}
}
return(data)
}
process_cases <- function(data){
ppv_under5 <- data.frame(matrix(NA, nrow = nrow(data)))
ppv_over8 <- data.frame(matrix(NA, nrow = nrow(data)))
counter <- 0
break_and_save <- FALSE
for (caseid in data$caseid){
counter <- counter + 1
closeAllConnections()
try_catch <- tryCatch(
{
cat("Importing ART for case:",caseid,"\n")
art <- VitalDBR::load_case('SNUADC/ART', caseid)
cat("Importing AWP for case:",caseid,"\n")
awp <- VitalDBR::load_case('Primus/AWP', caseid)
},
error = function(e){
break_and_save <<- TRUE
})
if(break_and_save) {
cat("Something went wrong when loading case:", caseid,"Saving results so far", "\n" )
break
}
start <- data$opstart[data$caseid==caseid]
end <- data$opend[data$caseid==caseid]
ppv_results <- na.omit(ppv_prepare(caseid, start, end, art,awp))
rm(art)
rm(awp)
len_ppv <- length(ppv_results)
ppv_under5[counter, 1] <- sum(ppv_results<=5)/len_ppv
ppv_over8[counter, 1] <- sum(ppv_results>=8)/len_ppv
}
colnames(ppv_under5) <- c('ppv_under5')
colnames(ppv_over8) <- c('ppv_over8')
data <- cbind(data, ppv_under5)
data <- cbind(data, ppv_over8)
return(data)
}
#ppv_data <- process_cases(merged)Notes:
This is a very computationally intensive function. It took almost 16 hours to caculate the PPV aggregates for all 1360 operations. Running a profiler revealed that the function “find_abp_beats” was the main culprit, and that over 80% of the time was spent in that function. Future work would preferably optimize that function. From our first iteration of the function to the end result we decreased the amount of RAM and cpu needed substantially, both by utilizing R’s vectorization, and also managing our RAM during the algorithm, ensuring that we remove variables from RAM manually, as soon as they were not needed anymore.
A catch-it-all approach was used two places in this function. Due to data-irregularity of some of the cases, we decided to put all calculations in one big try-except clause. We initially spent a long time trying to account for all edge cases, but in the end we decided that if an interval of the timeseries was of very bad data-quality, we would rather set it to NA, than keep trying to adapt our function around it. This was done as a result of a discussion with our advisors, who suggested, that we are not at all interested in data, if it is of such bad that no “organic process” can have caused the quality decline.
Regression analysis
Now that we have created our dataset, it
TO DO
- Vi skrev I vores projekt beskrivelse at vi ville tilføje funtioner til pakken der visualiserede gams. Det skal vi nok lige have gjort.
- Vi skal nok i store træk have forklaret hvad den store funktion gør. Dette kunne passende være jer andre der skrev det, så I bedre forstår den.
Udforsk betydningen af outcomes NOTER:
Det vi gerne vil nu, er at undersøge betydningen af pulsstrykvariationen på patienters 30 dages mortalitet. Derfor skal vi lave en logistisk regression der ser sådan her ud \[ mortality_i \sim PPV\_thresh_i,art\_mbp_i, Age_i, Sex_i, Asa_i, Eme\_ope_i, BMI_i\] Derfor skal vi bruge et datasæt der ser sådan her ud:
<<<<<<< HEADCase <- c(29,600,1009,1900)
mortality <- c(0,1,1,0)
PPV_thresh <- c(8,9,6,10)
ART_MBP <- c(54,65,69,70)
Age <- c(22,47,87,4)
Sex <- c(1,0,0,1)
Asa <- c(6,4,1,4)
Eme_ope <- c(1,0,0,1) #emergency operation
BMI <- c(32,16,29,15)
data.frame(Case, mortality,PPV_thresh, ART_MBP, Age, Sex, Asa, Eme_ope, BMI)Case <- c(29,600,1009,1900)
mortality <- c(0,1,1,0)
PPV_thresh <- c(8,9,6,10)
ART_MBP <- c(54,65,69,70)
Age <- c(22,47,87,4)
Sex <- c(1,0,0,1)
Asa <- c(6,4,1,4)
Eme_ope <- c(1,0,0,1) #emergency operation
BMI <- c(32,16,29,15)
data.frame(Case, mortality,PPV_thresh, ART_MBP, Age, Sex, Asa, Eme_ope, BMI)## Case mortality PPV_thresh ART_MBP Age Sex Asa Eme_ope BMI
## 1 29 0 8 54 22 1 6 1 32
## 2 600 1 9 65 47 0 4 0 16
## 3 1009 1 6 69 87 0 1 0 29
## 4 1900 0 10 70 4 1 4 1 15
Forklaringer: Case: - Case skal sorteres således at vi kun finder de operationer der er af typen: “open”, “general surgery” og “general Anathesia”. Dette kræver en udvidet load funktion. Der burde være lidt under 3000 observationer
PPV_thresh: - Hvis det så er 180 minutter skal vi fitte 180 gams og udtrække de 180 pulstryk variationer (som er den der amplitude/gennemsnit for pulstryk). Vi vil gerne vide hvor meget af tiden PPV ligger uden for intervallet (6,12) (domain knowledge). Der er lidt forskellige måder dette kan laves på. Men det skal ende ud i enten et samlet tal, eller måske to kolonner med andel af PPV_high, PPV_low. - Vi kan dnytte “anestart” og “aneend” fra clinical information API’en til kun at vælge det interval af operationen hvor patienten er i narkose.
ART_MBP (lav prioritet) : - ART_MBP (mean arterial pressure) er faktisk en del af SOLAR8000, så måske skal vi også begrænse vores cases til det? - Vi er intereserede i hvor ofte ART_MBP ligger under 65 (domain knowledge), så vi skal bare finde andelen af værdierne i ART_MBP under 65
Age, sex, asa, Eme_ops og BMI: - Disse punkter er bare nogle confounders vi skal have med, de ligger i clinical information API’en
Konkrete opgaver (prioriteret): 1. Opgrader load funktion så den kan loade ud fra kriterier i clinical information API - Høj prioritet. Tror jeg (Georg) tager den da jeg har skrevet det meste af load funktionaliteten.
2.Lav en funktion der tager et langt x antal minutter og outputter x PPV-værdier - Høj prioritet
- Lav funktion, der får et case ID og henter: Age, sex, asa, Eme_ops og BMI
- (Mellem prioritet) Dette kræver ikke at opgave 1 er løst, så den kan en af jer bare tage
- Udvid subset funktionen så man kan passe at den kun skal kigge på intervallet mellem “anestart” og “aneend”. ’
- (lav prioritet) Kræver måske at opgave 1 er løst, men ikke sikkert
Generelt til udvikling: - Fra nu af uploader vi ikke flere funktioner til pakken, men vi holder dem i de dokumenter vi skriver i, så samler vi til sidst, så slipper vi for konstant at skulle pushe til master - Ville være bedst hvis vi stoppede med at pushe til master, men istedet havde hver vores branch vi arbejdede i og så mergede. - Lad os holde vignetten som den er nu, og så ellers lave et dokument for hver funktion man arbejder på, hvor man kopierer de ting man skal bruge fra vignetten ind.
Spørgsmål: - Hvis vi skal bruge ART_MBP skal vi vel også restricte vores cases til dem der er optaget på SOLAR8000? - Skal vi definerer PPV udfra et tal eller to tal som beskrevet i beskrivelsen af PPV_threshx